Claims: 

We claim:



1. A computer-based method for identifying invariant peptide motifs useful as drug targets,
wherein the said method comprises the steps of:

i) generating computationally overlapping peptide libraries from all the protein sequences of the
selected organisms available at http://www.ncbi.nlm.nih.gov,

ii) sorting computationally the peptides of length 'N' obtained as above, alphabetically, according
to single letter amino acid code,

iii) matching computationally common peptide sequences of the selected bacteria,

iv) locating computationally these common peptides in the original proteins and subsequently
labeling them with their origin and location,

v) joining computationally the overlapping common peptides to obtain a long chain of invariant
peptide sequences,

vi) annotating secondary structure of these conserved peptides from the crystal structure database,

vii) comparing pathogenic strain genomes against genomes of non-pathogenic strains and selecting
the sequences not commonly conserved in these two groups,

viii) validating computationally the invariant sequence motifs as potential drug target sequences by
searching for the given conserved sequences in the host genome and rejecting the ones present in
the host genome.
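
By way of illustration only, steps (i) to (iii) and (viii) of claim 1 might be sketched in Python as follows. The function names, the toy sequences and the window length N = 5 are assumptions made for this sketch and do not appear in the claims. Step (viii) is approximated here by a substring search against host protein sequences rather than a full genome search.

```python
from typing import Dict, List, Set


def peptide_library(proteins: Dict[str, str], n: int) -> List[str]:
    """Return the alphabetically sorted, de-duplicated peptides of length n."""
    peptides: Set[str] = set()
    for sequence in proteins.values():
        # Slide a window of width n one residue at a time (overlapping peptides).
        for start in range(len(sequence) - n + 1):
            peptides.add(sequence[start:start + n])
    return sorted(peptides)


def common_peptides(libraries: List[List[str]]) -> List[str]:
    """Return the peptides present in every organism's library, sorted."""
    if not libraries:
        return []
    shared = set(libraries[0])
    for library in libraries[1:]:
        shared &= set(library)
    return sorted(shared)


def reject_host_peptides(peptides: List[str],
                         host_proteins: Dict[str, str]) -> List[str]:
    """Drop any peptide that occurs as a substring of a host protein."""
    return [p for p in peptides
            if not any(p in seq for seq in host_proteins.values())]


# Toy sequences, invented for illustration only.
organism_a = {"protA": "MKGSGSGKSTLAR"}
organism_b = {"protB": "TTSGSGKSTLQE"}
host = {"hostP": "AAGSGKSTW"}

libraries = [peptide_library(organism_a, 5), peptide_library(organism_b, 5)]
shared = common_peptides(libraries)
candidates = reject_host_peptides(shared, host)
print(shared)      # ['GKSTL', 'GSGKS', 'SGKST', 'SGSGK']
print(candidates)  # ['GKSTL', 'SGSGK']
```

Sets are used for the matching step because membership tests are cheap; the sorted lists mirror the alphabetical ordering recited in step (ii).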

2. The method of claim 1 wherein the length 'N' of the sliding window ranges from 4 to
any length of amino acid residues.

3. The method of claim 1 wherein the protein sequence data is taken from any organism, but not
specifically limited to, microbes such as Mycoplasma pneumoniae, Helicobacter pylori,
Haemophilus influenzae, Mycobacterium tuberculosis, Mycoplasma genitalium, Bacillus subtilis and
Escherichia coli.

4. A method as claimed in claim 1 wherein the conserved peptide motifs as identified comprise:



1. AAQSIGEPCTQLT 

2. AGDCTTTAT 

3. AGRHGNKG 

4. AHIDAGKTTT 

5. CPIETPEG 

6. DEPSIGLH 

7. DEFTSALD 

8. DEPTTALDVT 

9. DHAGIATQ 

10. DHPUGGGEG 

11. DLGGGTFD 

12. DVLDTWFSS 

13. ERERGITI 

14. ERGITITSAAT 

15. ESRRIDNQLRGR 

16. FSGGQRQR 

17. GEPGVGKTA 

18. GFDYLRDN 

19. GHNLQEHS 

20. GrDLGTTNS 

21. GINLLREGLI 

22. GIVGLPNVGJ 

23. GKSSLLNA 

24. GLTGRKIO

25. GPFGTGl 

26. GPPGVGI 

27. GSGKTTIJL 

28. GTRIFGPV 

29. IDTPGHVDFT 

30. llAHmHGKSTL 

31. INGFGRIGR 

32. IREOGRTVG 

33. IVGESGSGKS 

34. KFSTTYATWWI 



35. KMSKSKGN 

36. KMSKSLGN 

37. KNMITGAAQMDGAILW 

38. KPNSALRK 

39. LFGGAGVGKTV 

40. LGPSGCGK 

41. LHAGGKFD 

42. LIDEARTPLnSG 

43. LLNRAFTLH 

44. LPDKAIDLIDE 

45. LPGKLADC 

46. LSGGQQQR 

47. MGHVDHGKT 

48. NADFDGDQMAVH

49. NGAGKSTL 

50. NLLGKRVD 

51. NTDAEGRL 

52. PSAVGVQPTLA 

53. QRVAIARA 

54. QRYKGLGEM 

55. RDGLKPVHRR

56. SALDVSIQA 

57. SGGLHGVG 

58. SGSGKSSL 

59. SGSGKSTL 

60. SVFAGVGERTREGND 

61. TGRTHQIRVH 

62. TGVSGSGKS 

63. TLSGGEAQRI 

64. TNKYAEGYP 

65. TPRSNPATY 

66. VEGDSAGG 

67. VRKRPGMYIG 




5. A method as claimed in claim 1 wherein the number of invariant peptides varies according to
the relatedness among the organisms and the number of organisms being compared.

6. A method as claimed in claims 1-4 wherein the invariant sequences belong to the following proteins
available in the database at http://www.ncbi.nlm.nih.gov, wherein the said list of proteins
comprises:

I DNA-DIRECTED RNA POLYMERASE BETA CHAIN

II EXCINUCLEASE ABC SUBUNIT A

III EXCINUCLEASE ABC SUBUNIT B

IV DNA GYRASE SUBUNIT B

V ATP SYNTHASE BETA CHAIN

VI S-ADENOSYLMETHIONINE SYNTHETASE

VII GLYCERALDEHYDE 3-PHOSPHATE DEHYDROGENASE

VIII ELONGATION FACTOR G (EF-G)

IX ELONGATION FACTOR TU (EF-TU)

X 30S RIBOSOMAL PROTEIN S12

XI 50S RIBOSOMAL PROTEIN L12

XII 50S RIBOSOMAL PROTEIN L14

XIII VALYL tRNA SYNTHETASE (VALRS)

XIV CELL DIVISION PROTEIN FtsH HOMOLOG

XV DnaK PROTEIN (HSP70)

XVI GTP BINDING PROTEIN LepA

XVII TRANSPORTER

XVIII OLIGOPEPTIDE TRANSPORT ATP BINDING PROTEIN OPPF

7. A method as claimed in claim 1 wherein the said method of comparing the peptide libraries as
given in step (iii) of claim 1 is carried out by following the steps given in figure 1.

8. A method as claimed in claim 1 wherein the said method of locating the common peptides in
the original protein sequences as given in step (iv) of claim 1 is carried out by following the
steps given in figure 2.
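
Figure 2 is not reproduced in this section, so the following Python sketch is only one plausible reading of the locating step (iv): each common peptide is searched for in every original protein and each occurrence is tagged with its origin and location. The function name, the tuple layout and the toy sequences are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, str, int]  # (peptide, organism, protein id, 1-based start)


def locate_peptides(peptides: List[str],
                    organisms: Dict[str, Dict[str, str]]) -> List[Hit]:
    """Tag every occurrence of each peptide with its origin and location."""
    hits: List[Hit] = []
    for peptide in peptides:
        for organism, proteins in organisms.items():
            for protein_id, sequence in proteins.items():
                start = sequence.find(peptide)
                while start != -1:
                    hits.append((peptide, organism, protein_id, start + 1))
                    # Resume one residue later so overlapping repeats are found.
                    start = sequence.find(peptide, start + 1)
    return hits


# Toy sequences, invented for illustration only.
organisms = {
    "orgA": {"protA": "MKGSGSGKSTLAR"},
    "orgB": {"protB": "TTSGSGKSTLQE"},
}
hits = locate_peptides(["GKSTL"], organisms)
print(hits)
# [('GKSTL', 'orgA', 'protA', 7), ('GKSTL', 'orgB', 'protB', 6)]
```

Positions are reported 1-based, as is usual for residue numbering; that convention is a choice of this sketch rather than something the claim specifies.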

9. A method as claimed in claim 1 wherein the said method of creating a common peptide of
variable length after removing the overlaps as given in step (v) of claim 1 is carried out by
following the steps given in figure 3.
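
As figure 3 is not reproduced here, the joining step (v) can only be sketched under assumptions. In the Python sketch below, the common peptides are represented by their 0-based start positions in one protein, and windows that overlap are merged into a single longer stretch. The function name, the interval representation and the toy sequence are hypothetical.

```python
from typing import List, Tuple


def stitch(sequence: str, starts: List[int], n: int) -> List[Tuple[int, str]]:
    """Merge overlapping length-n windows into longer stretches.

    Returns (1-based start, stitched peptide) pairs in sequence order.
    """
    merged: List[Tuple[int, int]] = []  # half-open [begin, end) intervals
    for start in sorted(set(starts)):
        end = start + n
        if merged and start < merged[-1][1]:
            # Window overlaps the previous stretch, so extend that stretch.
            begin, previous_end = merged[-1]
            merged[-1] = (begin, max(previous_end, end))
        else:
            merged.append((start, end))
    return [(begin + 1, sequence[begin:end]) for begin, end in merged]


# Toy sequence, invented for illustration only.
protein = "MKGSGSGKSTLAR"
print(stitch(protein, [6, 3, 5, 4], 5))  # [(4, 'SGSGKSTL')]
print(stitch(protein, [0, 8], 5))        # [(1, 'MKGSG'), (9, 'STLAR')]
```

Windows that merely abut without sharing a residue are kept separate in this sketch, since the claim speaks of overlapping peptides.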

10. A microprocessor-based system for performing the methods of the invention which comprises:

i) means of determining the amino acid sequence window for creation of peptide library and
subsequent origin tagging,

ii) means of comparing the peptide library,

iii) locating computationally these common peptides in the original proteins and subsequently
labeling them with their origin and location,

iv) joining computationally the overlapping common peptides to obtain a long chain of invariant
peptide sequences.

11. A computer-based system for performing the methods of the invention further comprising a
central processing unit, executing peptide library creating program (PEPLIB), peptide library
matching program (PEPLIMP), peptide stitching program (PEPSTITCH), peptide extraction
program (PEPXTRACT) wherein the said programs are all stored in a memory device accessed
by the central processing unit connected to a display on which the central processing unit
displays the screens of the above mentioned programs in response to user inputs with a user
interface device.

12. A method for assigning function to a protein of unknown function showing no/weak homology
to other protein sequences in a publicly available database (SWISSPROT) by employing the
following steps:

I. generating computationally overlapping peptide library from the protein sequences
of unknown function,

II. sorting computationally the peptides of length 'N' (N is the length of the sliding
window of amino acids) obtained as above, alphabetically, according to single letter
amino acid code,

III. matching computationally the current library with peptide library of all functionally
known proteins to obtain common peptides,

IV. locating computationally these common peptides in the original proteins and
subsequently labeling them with their origin and location,

V. joining computationally the overlapping common peptides to obtain a long chain of
invariant peptide sequences,

VI. assigning function to the unknown protein based on the function of the protein with
which maximum length of peptide sequence identity is found. The greater the
number of matches with proteins of similar function, the higher the likelihood of
the functional assignment.
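
The assignment rule of step VI might be sketched in Python as follows, under stated assumptions. The function names, the toy sequences and the function labels are invented for illustration, and the window length N = 5 is arbitrary. The sketch implements only the "maximum length of peptide sequence identity" criterion and does not count the number of matching proteins.

```python
from typing import Dict, Optional, Tuple


def longest_shared_stretch(query: str, reference: str, n: int) -> int:
    """Length of the longest stretch of at least n residues shared by both."""
    best = 0
    for start in range(len(query) - n + 1):
        end = start + n
        # Grow the window while it still occurs verbatim in the reference.
        while end <= len(query) and query[start:end] in reference:
            best = max(best, end - start)
            end += 1
    return best


def assign_function(query: str,
                    known: Dict[str, Tuple[str, str]],
                    n: int = 5) -> Optional[Tuple[str, int]]:
    """Return (function, shared length) of the best-matching known protein.

    `known` maps a protein id to (sequence, function label).
    """
    best_function: Optional[str] = None
    best_length = 0
    for sequence, function in known.values():
        length = longest_shared_stretch(query, sequence, n)
        if length > best_length:
            best_function, best_length = function, length
    if best_function is None:
        return None
    return best_function, best_length


# Toy data, invented for illustration only.
query = "MKGSGSGKSTLAR"
known = {
    "ref1": ("TTSGSGKSTLQE", "ATP-binding transporter (toy label)"),
    "ref2": ("AAGSGKSTW", "kinase (toy label)"),
}
result = assign_function(query, known, n=5)
print(result)  # ('ATP-binding transporter (toy label)', 8)
```

A query sharing no stretch of at least N residues with any known protein returns `None`, which leaves the protein unassigned rather than forcing a label.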